Abstract. The Bethe approximation is a successful method for approximating partition functions of probabilistic models associated with a graph. Recently, Chertkov and Chernyak derived an interesting formula called the loop series expansion, which is an expansion of the partition function. The main term of the series is the Bethe approximation, while the other terms are labelled by subgraphs called generalized loops. In this paper, we derive a loop series expansion of binary pairwise Markov random fields with "propagation diagrams", which describe rules for how "first messages" and "secondary messages" propagate. Our approach allows us to express the loop series in the form of a polynomial with positive integer coefficients. Using the propagation diagrams, we establish a new formula that shows a relation between the exact marginal probabilities and their Bethe approximations.
1. Introduction

A Markov random field (MRF) associated with a graph is given by a joint probability distribution over a set of variables. In the associated graph, the nodes represent variables and the edges represent probabilistic dependence between variables. A typical example of an MRF is the Gibbs distribution of the Ising model on a finite lattice. The joint distribution is often given in an unnormalized form, and the normalization factor of an MRF is called the partition function.

The main topic of this paper is the computation of the partition function and the marginal distributions of an MRF with discrete variables. This problem is in general computationally intractable for a large number of variables, and some approximation method is required. Among many approximation methods, the Bethe approximation [1] has attracted renewed interest from computer scientists; it is equivalent to the loopy belief propagation (LBP) algorithm [2], [3], which has been successfully used in many applications such as error-correcting codes, inference on graphs and image processing [4]–[6].
Chertkov and Chernyak [7], [8] give a new and interesting formula called the loop series expansion, which expresses the partition function in terms of a finite series. The first term is the Bethe approximation, and the others are labelled by so-called generalized loops. The Bethe approximation can be corrected with this formula, though summing up all the terms requires computational effort exponential in the number of linearly independent cycles.

In this paper we propose an alternative diagram-based method for deriving the loop series expansion formula. In our approach, we define secondary messages, which are orthogonal to the messages used in the LBP algorithm, and show that they satisfy a set of rules as they propagate. With each node and each edge we associate parameters $\{\gamma_i\}$ and $\{\beta_{ij}\}$, respectively; $\gamma_i$ is related to the approximated marginal of node $i$, and $\beta_{ij}$ to the approximated correlation of adjacent nodes $i$ and $j$. The loop series is represented by a polynomial in these variables with positive integer coefficients. This positivity is useful for deriving a bound on the number of generalized loops.

The main result of this paper is theorem 4, with which we can calculate the true marginal probabilities in terms of the beliefs at the convergence of LBP, $\{\gamma_i\}$ and $\{\beta_{ij}\}$. The terms in the formula for the marginals depend on the topological structure of the graph.

This paper is organized as follows. In section 2, we briefly review the definitions of pairwise MRFs, the Bethe approximation and the LBP algorithm. In section 3, we characterize the Bethe approximation as the fixed points of LBP and deduce the fixed point equation in theorem 1. In section 4, we define first and secondary messages and study their propagation rules. These rules are the fundamental tools for our analysis. In section 5, we derive the loop series formula and compare it with the results of Chertkov and Chernyak. We deduce consequences of our representation of the expansion: the connection to the partition function of the Ising model (with uniform coupling constant and external field) and an upper bound on the number of generalized loops. In section 6, we prove an expansion formula for the true marginal probability and provide some examples.

2. Bethe approximation and loopy belief propagation algorithm 

2.1. Pairwise Markov random field 

We introduce the probabilistic model considered in this paper: an MRF of binary states with pairwise interactions. Let $G := (V,E)$ be a connected undirected graph, where $V = \{1,\ldots,N\}$ is a set of nodes and $E \subset \{(i,j);\ 1 \le i < j \le N\}$ is a set of undirected edges. Each node $i \in V$ is associated with a binary space $\chi_i = \{\pm 1\}$. We make a set of directed edges from $E$ by $\vec E = \{(i,j),(j,i);\ (i,j)\in E\}$. The set of neighbours of $i$ is denoted by $N(i) \subset V$, and $d_i = |N(i)|$ is called the degree of $i$. A joint probability distribution on the graph $G$ is given in the form

$$p(x) = \frac{1}{Z}\prod_{ij\in E}\psi_{ij}(x_i,x_j)\prod_{i\in V}\phi_i(x_i), \qquad (1)$$

where $\psi_{ij}: \chi_i \times \chi_j \to \mathbb{R}_{>0}$ and $\phi_i: \chi_i \to \mathbb{R}_{>0}$ are positive functions called compatibility functions. The normalization factor $Z$ is called the partition function. A set of random variables which has a probability distribution of the form (1) is called a Markov random field (MRF) or an undirected graphical model on the graph $G$. This class of probability distributions is equivalent to the Ising model with arbitrary coupling constants and local magnetic fields. In the traditional literature of statistical physics, the graph $G$ is often an infinite lattice, but in recent work, especially in computer science, $G$ has an arbitrary topology with finitely many nodes.

Without loss of generality, the univariate compatibility functions $\phi_i$ can be neglected, because they can be included in the bivariate compatibility functions $\psi_{ij}$. This operation does not affect the Bethe approximation or the LBP algorithm given below; we assume it in what follows.
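Absorbing the unary factors is a purely local rewrite. The sketch below is a minimal illustration that assumes tables are stored as nested lists, with index 0 for $x=+1$ and index 1 for $x=-1$. The helper names `absorb_unary` and `weight` are hypothetical. Each $\phi_i$ is multiplied into the first incident edge encountered, which is only one of many valid choices.

```python
import itertools
import math


def absorb_unary(edges, psi, phi):
    """Fold every unary factor phi[i] into exactly one incident pairwise factor.

    psi[(i, j)][a][b] holds psi_ij(x_i, x_j) and phi[i][a] holds phi_i(x_i),
    with index 0 for x = +1 and index 1 for x = -1. Every node is assumed to
    lie on at least one edge. The input tables are not modified.
    """
    new_psi = {e: [row[:] for row in psi[e]] for e in edges}
    absorbed = set()
    for (i, j) in edges:
        for node, side in ((i, 0), (j, 1)):
            if node in absorbed:
                continue
            absorbed.add(node)
            for a in range(2):
                for b in range(2):
                    # side 0: node is the first argument (x_i = a); side 1: second (x_j = b)
                    new_psi[(i, j)][a][b] *= phi[node][a if side == 0 else b]
    return new_psi


def weight(edges, psi, phi, x):
    """Unnormalized weight prod_ij psi_ij(x_i, x_j) * prod_i phi_i(x_i); phi may be empty."""
    w = math.prod(psi[(i, j)][x[i]][x[j]] for (i, j) in edges)
    return w * math.prod(phi[i][x[i]] for i in phi)
```

Any assignment of each $\phi_i$ to exactly one incident edge gives the same joint distribution. Spreading $\phi_i^{1/d_i}$ over all incident edges would work equally well.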

2.2. Loopy belief propagation algorithm 

The LBP algorithm computes the Bethe approximation of the partition function and the marginal distribution of each node by message passing [9], [2], [3]. The algorithm is summarized as follows.

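(i) Initialization:

For all $(j,i)\in\vec E$, the message from $i$ to $j$ is a vector $m^0_{(j,i)}\in\mathbb{R}^2$. Initialize it as

$$m^0_{(j,i)}(x_j) = 1 \qquad \forall x_j\in\chi_j.$$

(ii) Message passing:

For each $t = 0, 1, \ldots$, update the messages by

$$m^{t+1}_{(j,i)}(x_j) = \omega\sum_{x_i\in\chi_i}\psi_{ji}(x_j,x_i)\prod_{k\in N(i)\setminus\{j\}}m^t_{(i,k)}(x_i) \qquad (2)$$

until they converge. Finally we obtain $\{m^*_{(j,i)}\}_{(j,i)\in\vec E}$.

(iii) The approximated marginals and the partition function are computed by the following formulas:

$$b_i(x_i) := \omega\prod_{j\in N(i)}m^*_{(i,j)}(x_i), \qquad (4)$$

$$b_{ji}(x_j,x_i) := \omega\,\psi_{ji}(x_j,x_i)\prod_{s\in N(j)\setminus\{i\}}m^*_{(j,s)}(x_j)\prod_{s'\in N(i)\setminus\{j\}}m^*_{(i,s')}(x_i), \qquad (5)$$

$$\log Z_B := -\sum_{ij\in E}\sum_{x_i,x_j}b_{ij}(x_i,x_j)\log\frac{b_{ij}(x_i,x_j)}{\psi_{ij}(x_i,x_j)} + \sum_{i\in V}(d_i-1)\sum_{x_i}b_i(x_i)\log b_i(x_i), \qquad (6)$$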
Figure 1. A new node can be added on an edge.

Figure 2. An example of $G$ and $\tilde G$.



where $\omega$ denotes appropriate normalization constants, $b_i$ are called beliefs, and $Z_B$ is called the Bethe approximation of the partition function.

In step (ii), there is ambiguity as to the order in which the messages are updated. We do not specify the order, because the fixed points of the LBP algorithm do not depend on this choice. Note that the LBP algorithm does not necessarily converge, and there may be more than one fixed point unless the interactions are sufficiently weak [10].
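Steps (i)–(iii) can be sketched in a few lines of pure Python. This is an illustration under our own assumptions, not a reference implementation. Messages are updated synchronously for a fixed number of sweeps instead of being tested for convergence. Tables are nested lists with index 0 for $x=+1$. All function names are hypothetical. The Bethe value is computed from (6).

```python
import itertools
import math


def pair(psi, i, j, a, b):
    """psi_ij(x_i = a, x_j = b), whichever orientation of the edge is stored."""
    return psi[(i, j)][a][b] if (i, j) in psi else psi[(j, i)][b][a]


def loopy_bp(n, edges, psi, iters=200):
    """Synchronous LBP. m[(j, i)] is the message from i to j, a function of x_j, cf. (2)."""
    nb = {v: [] for v in range(n)}
    for i, j in edges:
        nb[i].append(j)
        nb[j].append(i)
    m = {(j, i): [1.0, 1.0] for i in range(n) for j in nb[i]}  # step (i)
    for _ in range(iters):  # step (ii), fixed number of sweeps
        new = {}
        for (j, i) in m:
            v = [sum(pair(psi, j, i, a, c) *
                     math.prod(m[(i, k)][c] for k in nb[i] if k != j)
                     for c in range(2)) for a in range(2)]
            s = sum(v)  # plays the role of the normalization constant omega
            new[(j, i)] = [x / s for x in v]
        m = new
    return nb, m


def bethe(n, edges, psi, nb, m):
    """Step (iii): beliefs (4), (5) and log Z_B from (6)."""
    b1 = {}
    for i in range(n):
        v = [math.prod(m[(i, j)][a] for j in nb[i]) for a in range(2)]
        s = sum(v)
        b1[i] = [x / s for x in v]
    b2, log_zb = {}, 0.0
    for (i, j) in edges:
        t = [[pair(psi, i, j, a, c)
              * math.prod(m[(i, k)][a] for k in nb[i] if k != j)
              * math.prod(m[(j, k)][c] for k in nb[j] if k != i)
              for c in range(2)] for a in range(2)]
        s = sum(map(sum, t))
        b2[(i, j)] = [[x / s for x in row] for row in t]
        log_zb -= sum(b2[(i, j)][a][c] * math.log(b2[(i, j)][a][c] / pair(psi, i, j, a, c))
                      for a in range(2) for c in range(2))
    for i in range(n):
        log_zb += (len(nb[i]) - 1) * sum(p * math.log(p) for p in b1[i])
    return b1, b2, log_zb


def exact_z(n, edges, psi):
    """Brute-force partition function, feasible only for small n."""
    return sum(math.prod(pair(psi, i, j, x[i], x[j]) for (i, j) in edges)
               for x in itertools.product(range(2), repeat=n))
```

On a tree, such as a star, synchronous LBP becomes exact after a number of sweeps equal to the diameter. There $Z_B$ coincides with the exact $Z$, while on graphs with cycles the two generally differ.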

3. Fixed point equation of LBP 

When LBP converges, the converged messages $\{m^*_{(j,i)}\}$ satisfy a certain equation, shown in theorem 1. By this theorem we show that the converged messages can be normalized simultaneously, and we define $\{\mu_{(j,i)}\}_{(j,i)\in\vec E}$, called first messages.

3.1. Graph operations 

First, we remark that we can always add a new node without changing the marginals and beliefs of the other nodes. For an edge $(i,j)$, we can add a node $k$ between $i$ and $j$ as in figure 1, with new compatibility functions $\psi_{ik}$ and $\psi_{kj}$ satisfying

$$\psi_{ij}(x_i,x_j) = \sum_{x_k\in\chi_k}\psi_{ik}(x_i,x_k)\,\psi_{kj}(x_k,x_j). \qquad (7)$$

This operation will be used implicitly many times in this paper. Adding new nodes if necessary, we can always assume that there are sufficiently many nodes of degree two.

Next we define a graph $\tilde G$ from $G$. Let $L := |E| - |V| + 1$ be the number of linearly independent cycles. Cutting and duplicating $L$ nodes of degree two appropriately, we obtain a connected tree $\tilde G$, since we assume that there are sufficiently many nodes of degree two [11]. See figure 2. Renumbering the nodes of $V$, we assume that the cut nodes are numbered $\{1,\ldots,L\}$. We define $\tilde G = (\tilde V,\tilde E)$ by $V_1 = \{1,\ldots,L\}\cup\{\bar 1,\ldots,\bar L\}$, $V_2 = \{L+1,\ldots,N\}$ and $\tilde V = V_1\cup V_2$. The edge set $\tilde E$ is also defined naturally. We call $V_1$ the leaf nodes.



3.2. Belief propagation equations 

Using the converged messages $\{m^*_{(j,i)}\}$, we define messages coming into the leaf nodes of the graph $\tilde G$ as follows. Let $s$ be a cut node with neighbours $N(s) = \{u,v\}$ in $G$, and let $(s,v), (\bar s,u)\in\tilde E$ be the edges at the duplicated nodes. We define $\mu_s \propto m^*_{(s,u)}$ and $\bar\mu_s \propto m^*_{(s,v)}$, and normalize them by $\sum_{x_s}\bar\mu_s(x_s)\mu_s(x_s) = 1$. Generalizations of the transfer matrices are defined by

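$$T^{x_1,\ldots,x_L}_{\bar x_1,\ldots,\bar x_L} := \sum_{x_{L+1},\ldots,x_N}\prod_{ij\in\tilde E}\psi_{ij}(x_i,x_j), \qquad (8)$$

$$T_s{}^{x_s}_{\bar x_s} := \sum_{\substack{x_1,\ldots,x_{s-1},x_{s+1},\ldots,x_L\\ \bar x_1,\ldots,\bar x_{s-1},\bar x_{s+1},\ldots,\bar x_L}} T^{x_1,\ldots,x_L}_{\bar x_1,\ldots,\bar x_L}\prod_{m=1,\,m\neq s}^{L}\mu_m(x_m)\,\bar\mu_m(\bar x_m). \qquad (9)$$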

Since $\{m^*_{(j,i)}\}$ is a fixed point of the LBP algorithm, it is easy to see that there is $\alpha_s > 0$ that satisfies $\alpha_s\mu_s(\bar x_s) = \sum_{x_s}T_s{}^{x_s}_{\bar x_s}\mu_s(x_s)$. The following theorem states that all $\alpha_s$ are equal to $Z_B$.

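Theorem 1. For $s = 1,\ldots,L$,

$$Z_B\,\mu_s(\bar x_s) = \sum_{x_s\in\chi_s}T_s{}^{x_s}_{\bar x_s}\,\mu_s(x_s) \qquad \forall\bar x_s\in\chi_s, \qquad (10)$$

$$Z_B\,\bar\mu_s(x_s) = \sum_{\bar x_s\in\chi_s}T_s{}^{x_s}_{\bar x_s}\,\bar\mu_s(\bar x_s) \qquad \forall x_s\in\chi_s. \qquad (11)$$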

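Proof. Consider $\alpha_s\mu_s(\bar x_s) = \sum_{x_s}T_s{}^{x_s}_{\bar x_s}\mu_s(x_s)$ and $\bar\alpha_s\bar\mu_s(x_s) = \sum_{\bar x_s}T_s{}^{x_s}_{\bar x_s}\bar\mu_s(\bar x_s)$. Since $\mu_s$ and $\bar\mu_s$ are positive vectors, $\alpha_s$ and $\bar\alpha_s$ are the Perron–Frobenius eigenvalues of the transpose of the positive matrix $T_s$ and of $T_s$ itself, respectively. Therefore $\alpha_s = \bar\alpha_s$ for $1 \le s \le L$. Next, we prove $\alpha_s = \alpha_l$ for $1 \le s, l \le L$. As we normalize $\sum_{x_s}\bar\mu_s(x_s)\mu_s(x_s) = 1$,

$$\alpha_s = \sum_{x_s,\bar x_s}T_s{}^{x_s}_{\bar x_s}\mu_s(x_s)\bar\mu_s(\bar x_s) = \sum_{x_1,\ldots,x_L}\sum_{\bar x_1,\ldots,\bar x_L}T^{x_1,\ldots,x_L}_{\bar x_1,\ldots,\bar x_L}\prod_{m=1}^{L}\mu_m(x_m)\bar\mu_m(\bar x_m) =: \alpha,$$

which shows $\alpha_l = \alpha_s = \alpha$. Finally we prove $\alpha = Z_B$. By dividing one of the $\psi$ functions by $\alpha$ from the outset, we can assume that $\alpha = 1$. It is then sufficient to prove $\log Z_B = 0$.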
By distributing the messages $\{\mu_s\}_{s\in V_1}$ on the tree $\tilde G$ from its leaf nodes $V_1$ without normalization, we define a message $\mu_{(j,i)}$ on each directed edge $(j,i)$ of $\tilde G$. Because $\tilde G$ is a tree, the messages are uniquely defined step by step from the leaf nodes. By the assumption $\alpha = 1$, we have $\mu_s = \mu_{(s,u)}$. Therefore,

$$\mu_{(j,i)}(x_j) = \sum_{x_i\in\chi_i}\psi_{ji}(x_j,x_i)\prod_{s\in N(i)\setminus\{j\}}\mu_{(i,s)}(x_i) \qquad \forall (j,i)\in\vec E. \qquad (12)$$

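By the relation

$$\sum_{x_i}\prod_{j\in N(i)}\mu_{(i,j)}(x_i) = \sum_{x_1,\ldots,x_L}\sum_{\bar x_1,\ldots,\bar x_L}T^{x_1,\ldots,x_L}_{\bar x_1,\ldots,\bar x_L}\prod_{m=1}^{L}\mu_m(x_m)\bar\mu_m(\bar x_m) = 1, \qquad (13)$$

we obtain, with all normalization constants $\omega$ in (4) and (5) equal to one,

$$b_i(x_i) = \prod_{j\in N(i)}\mu_{(i,j)}(x_i), \qquad (14)$$

$$b_{ji}(x_j,x_i) = \psi_{ji}(x_j,x_i)\prod_{s\in N(j)\setminus\{i\}}\mu_{(j,s)}(x_j)\prod_{s'\in N(i)\setminus\{j\}}\mu_{(i,s')}(x_i). \qquad (15)$$

The assertion follows from substituting (14) and (15) into the definition (6) of $Z_B$. □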

As in the proof of the above theorem, the normalization $Z_B = 1$ is convenient; in the rest of this paper we assume the following.

Assumption. By normalizing one of $\{\psi_{ij}\}$, we assume $Z_B = 1$.

In the proof of the above theorem, we defined the messages $\mu_{(j,i)} \propto m^*_{(j,i)}$ satisfying (12), (14) and (15). We call $\mu_{(j,i)}$ (normalized) first messages. While these conditions are similar to (2), (4) and (5), an important difference is the disappearance of the normalization constants $\omega$. By the above assumption, the first messages satisfy (12). Notice that $\mu$ is defined on the graph $G$, not only on the tree $\tilde G$.

This equation is a generalization of the recursive expression for calculating the free energy of the Bethe lattice [12]. For each fixed point $\{m^*_{(j,i)}\}$ of the LBP algorithm, there is a solution of equations (10) and (11). Conversely, for each solution of (10) and (11), there is an LBP fixed point.

4. Propagation diagrams 

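We proved in the previous section that, if we normalize $\sum_{x_s}\bar\mu_s(x_s)\mu_s(x_s) = 1$, the Bethe approximation of the partition function is given by

$$Z_B = \sum_{x_1,\ldots,x_L}\sum_{\bar x_1,\ldots,\bar x_L}T^{x_1,\ldots,x_L}_{\bar x_1,\ldots,\bar x_L}\prod_{m=1}^{L}\mu_m(x_m)\,\bar\mu_m(\bar x_m), \qquad (16)$$

while the true partition function is

$$Z = \sum_{x_1,\ldots,x_L}T^{x_1,\ldots,x_L}_{x_1,\ldots,x_L} = \sum_{x_1,\ldots,x_L}\sum_{\bar x_1,\ldots,\bar x_L}T^{x_1,\ldots,x_L}_{\bar x_1,\ldots,\bar x_L}\prod_{m=1}^{L}\delta_{x_m,\bar x_m}. \qquad (17)$$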

Let us define vectors $\nu_s$ and $\bar\nu_s$ for $s = 1,\ldots,L$ so as to satisfy

$$\sum_{x_s}\bar\mu_s(x_s)\nu_s(x_s) = 0,\qquad \sum_{x_s}\mu_s(x_s)\bar\nu_s(x_s) = 0,\qquad \sum_{x_s}\bar\nu_s(x_s)\nu_s(x_s) = 1. \qquad (18)$$

Then we have a decomposition of the unit matrix

$$\delta_{x_s,\bar x_s} = \mu_s(x_s)\,\bar\mu_s(\bar x_s) + \nu_s(x_s)\,\bar\nu_s(\bar x_s). \qquad (19)$$

Using (19), we can expand (17) into a sum of $2^L$ terms. The first term is obviously the Bethe approximation of the partition function (16), but the explicit form of the remaining $2^L - 1$ terms is not obvious. In this section we define secondary messages $\{\nu_{(i,j)}\}_{(i,j)\in\vec E}$ and derive rules describing how these messages propagate, in order to obtain the remaining terms.
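For binary variables, the vectors in (18) can be written down explicitly. $\nu_s$ must be orthogonal to $\bar\mu_s$, and $\bar\nu_s$ to $\mu_s$, so swapping the two components and flipping one sign suffices. The sketch below illustrates this under our own conventions: index 0 for $x=+1$, first component chosen negative, and hypothetical function names. It builds the vectors and checks the unit-matrix decomposition (19).

```python
def secondary_pair(mu, mu_bar):
    """Return (nu, nu_bar) satisfying (18) for positive 2-vectors mu and mu_bar.

    Requires sum_x mu_bar(x) mu(x) = 1. Index 0 is x = +1, index 1 is x = -1.
    """
    if abs(mu[0] * mu_bar[0] + mu[1] * mu_bar[1] - 1.0) > 1e-12:
        raise ValueError("normalize so that sum_x mu_bar(x) mu(x) = 1")
    nu = [-mu_bar[1], mu_bar[0]]   # sum_x mu_bar(x) nu(x) = 0
    nu_bar = [-mu[1], mu[0]]       # sum_x mu(x) nu_bar(x) = 0
    # sum_x nu_bar(x) nu(x) = mu[1] mu_bar[1] + mu[0] mu_bar[0] = 1
    return nu, nu_bar


def decomposition(mu, mu_bar, nu, nu_bar):
    """Matrix with entries mu(x) mu_bar(xb) + nu(x) nu_bar(xb); by (19) it is the identity."""
    return [[mu[x] * mu_bar[xb] + nu[x] * nu_bar[xb] for xb in range(2)] for x in range(2)]
```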
4.1. Definition of secondary messages

By splitting nodes as in figure 3, any graph can be transformed so that every node has degree at most three. We make the compatibility functions of the new edges infinitely strong: $\psi_{ij}(x_i,x_j) = \delta_{x_i,x_j}$. This change does not affect the fixed points of the LBP algorithm, and hence does not affect the Bethe approximation. In subsections 4.1 and 4.2 we assume that the graph $G$ has undergone this transformation. The same transformation also appears in [13] and [14]. The first messages $\mu$ are defined on this transformed graph.

We define secondary messages $\{\nu_{(j,i)}\}$ on the transformed graph using $\{\mu_{(j,i)}\}$. First, let $j$ be a node of degree two with $N(j) = \{i,k\}$, such that at least one adjacent node, denoted by $k$, has degree two (figure 4). We define $\nu$ at the node $j$ by the following conditions:

$$\sum_{x_j}\nu_{(j,i)}(x_j)\,\mu_{(j,k)}(x_j) = 0,\qquad \sum_{x_j}\mu_{(j,i)}(x_j)\,\nu_{(j,k)}(x_j) = 0, \qquad (20)$$

$$\sum_{x_j}\nu_{(j,i)}(x_j)\,\nu_{(j,k)}(x_j) = 1. \qquad (21)$$

Similarly, let $j$ be a node of degree three with $N(j) = \{i,k,l\}$, such that at least one adjacent node, denoted by $i$, has degree two (figure 5). We define $\nu$ at node $j$ by the following conditions, where every factor is evaluated at $x_j$:

$$\sum_{x_j}\nu_{(j,i)}\mu_{(j,k)}\mu_{(j,l)} = 0,\qquad \sum_{x_j}\mu_{(j,i)}\nu_{(j,k)}\mu_{(j,l)} = 0,\qquad \sum_{x_j}\mu_{(j,i)}\mu_{(j,k)}\nu_{(j,l)} = 0, \qquad (22)$$

$$\sum_{x_j}\nu_{(j,i)}\nu_{(j,k)}\mu_{(j,l)} = 1,\qquad \sum_{x_j}\nu_{(j,i)}\mu_{(j,k)}\nu_{(j,l)} = 1,\qquad \sum_{x_j}\mu_{(j,i)}\nu_{(j,k)}\nu_{(j,l)} = 1. \qquad (23)$$

These conditions determine $\nu_{(j,i)}$ uniquely up to a scalar factor at nodes of degree two, and up to sign at nodes of degree three. We assume without loss of generality that the first component is negative. The above relations and $\sum_{x_j}\prod_{i\in N(j)}\mu_{(j,i)}(x_j) = 1$ (13) are summarized pictorially in figure 6. These cases are sufficient, because we can add a node of degree two on any edge if necessary. We call such diagrams propagation diagrams. The blue dashed and the light red arrows represent $\mu$ and $\nu$, respectively.



Figure 3. A node of degree $n$ can be split into $n-2$ nodes of degree 3.

Figure 4. Node $j$ has degree 2.

Figure 5. Node $j$ has degree 3 and node $i$ has degree 2.

Figure 6. Rules at degree 2 and 3 nodes.

A condition similar to (20) and (22) is imposed in [15] to deduce the update rule of the LBP algorithm, though a graphical model with variables on the edges is considered there.

4.2. Propagation rules

The rules depicted in figure 6 describe what happens when messages collide at a node. The next two lemmas show how first and secondary messages propagate.

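Lemma 1. See figure 7. Suppose nodes $j$ and $k$ are of degree 2, with $N(j) = \{i,k\}$ and $N(k) = \{j,l\}$. Then there is $\beta_{jk}\in\mathbb{R}$ such that

$$\beta_{jk}\,\nu_{(j,k)}(x_j) = \sum_{x_k}\psi_{jk}(x_j,x_k)\,\nu_{(k,l)}(x_k),\qquad \beta_{jk}\,\nu_{(k,j)}(x_k) = \sum_{x_j}\psi_{kj}(x_k,x_j)\,\nu_{(j,i)}(x_j). \qquad (24)$$

Proof. Using (12) and (20), $\sum_{x_j,x_k}\mu_{(j,i)}(x_j)\psi_{jk}(x_j,x_k)\nu_{(k,l)}(x_k) = \sum_{x_k}\mu_{(k,j)}(x_k)\nu_{(k,l)}(x_k) = 0$. By (20), $\nu_{(j,k)}$ is determined up to a scalar factor by its orthogonality to $\mu_{(j,i)}$, hence $\sum_{x_k}\psi_{jk}(x_j,x_k)\nu_{(k,l)}(x_k) \propto \nu_{(j,k)}(x_j)$. The proportionality constant is the same in both directions $(j,k)$ and $(k,j)$ because of condition (21): in both cases it equals $\sum_{x_j,x_k}\nu_{(j,i)}(x_j)\psi_{jk}(x_j,x_k)\nu_{(k,l)}(x_k)$. □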

Next we proceed to a degree three node. 

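Lemma 2. See figure 7. Suppose node $j$ is of degree three with $N(j) = \{i,k,l\}$, and $i$ is of degree two with $N(i) = \{j,t\}$. Then there is $\beta_{ji}\in\mathbb{R}$ such that

$$\beta_{ji}\,\nu_{(j,i)}(x_j) = \sum_{x_i}\psi_{ji}(x_j,x_i)\,\nu_{(i,t)}(x_i), \qquad (25)$$

$$\beta_{ji}\,\nu_{(i,j)}(x_i) = \sum_{x_j}\psi_{ij}(x_i,x_j)\,\nu_{(j,l)}(x_j)\,\mu_{(j,k)}(x_j), \qquad (26)$$

$$\beta_{ji}\,\nu_{(i,j)}(x_i) = \sum_{x_j}\psi_{ij}(x_i,x_j)\,\mu_{(j,l)}(x_j)\,\nu_{(j,k)}(x_j). \qquad (27)$$

Proof. This can be shown in the same way as the previous lemma. □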

These lemmas say that the secondary messages $\nu$ propagate with rate $\beta$ in both directions, whereas the first messages $\mu$ propagate without any change of scale. Equations (26) and (27) also hold when the adjacent nodes $i$ and $j$ both have degree three, as in figure 8. We associate numbers $\beta_{ij}$ with all undirected edges in $E$. Propagation diagrams of these results are summarized in figure 7.
Figure 8. Nodes $j$ and $i$ have degree three.

4.2.1. Two more rules. In addition to the rules in figure 6, we have to give a rule for the case in which three secondary messages $\nu$ collide at a node.

Lemma 3. Let $N(j) = \{i,k,l\}$. Then

$$\sum_{x_j}\nu_{(j,i)}(x_j)\,\nu_{(j,k)}(x_j)\,\nu_{(j,l)}(x_j) = \frac{b_j(1) - b_j(-1)}{\sqrt{b_j(1)\,b_j(-1)}} =: \gamma_j. \qquad (28)$$

Proof. We need a direct computation of the message vectors. By the orthogonality conditions (22), $\nu_{(j,i)}$, $\nu_{(j,k)}$ and $\nu_{(j,l)}$ are determined up to scalar factors. The scales are determined by (23). Since we assume the first components of $\nu$ are negative, $\nu$ must have the following form:

$$\nu_{(j,i)}(x_j) = -x_j\sqrt{\frac{\mu_{(j,i)}(x_j)\,\mu_{(j,i)}(-x_j)\,\mu_{(j,k)}(-x_j)\,\mu_{(j,l)}(-x_j)}{\mu_{(j,k)}(x_j)\,\mu_{(j,l)}(x_j)}}. \qquad (29)$$

The result follows from (14). □
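The explicit form (29) is easy to check numerically. The following sketch is our own illustration with hypothetical names. It builds $\nu$ at a degree-three node from arbitrary positive first messages normalized as in (13). It then verifies the conditions (22) and (23) and the value (28) of the triple contraction. Index 0 stands for $x=+1$.

```python
import math


def contract(*vectors):
    """sum over x in {+1, -1} of the product of the given 2-vectors (index 0 is x = +1)."""
    return sum(math.prod(v[a] for v in vectors) for a in range(2))


def nu_degree_three(mu):
    """Secondary messages at a degree-3 node j from (29).

    mu maps each neighbour to [mu_(j,.)(+1), mu_(j,.)(-1)], normalized so that
    contract(*mu.values()) == 1, i.e. condition (13).
    """
    nu = {}
    for i in mu:
        k, l = [n for n in mu if n != i]
        nu[i] = [(-1.0 if a == 0 else 1.0)  # the factor -x_j
                 * math.sqrt(mu[i][a] * mu[i][1 - a] * mu[k][1 - a] * mu[l][1 - a]
                             / (mu[k][a] * mu[l][a]))
                 for a in range(2)]
    return nu
```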

We use the next lemma when we split a node as in figure 3.

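Lemma 4. See figure 8. If the nodes $j$ and $i$ are of degree three and $\psi_{ij}(x_i,x_j) = \delta_{x_i,x_j}$, then $\beta_{ij} = 1$.

Proof. Let $N(j) = \{i,k,l\}$. By (14), (29) can be written as $\nu_{(j,k)}(x_j) = -x_j\,\mu_{(j,k)}(x_j)\sqrt{b_j(-x_j)/b_j(x_j)}$. Hence, by (27) and the remark after lemma 2,

$$\beta_{ij}\,\nu_{(i,j)}(x_i) = \sum_{x_j}\psi_{ij}(x_i,x_j)\,\mu_{(j,l)}(x_j)\,\nu_{(j,k)}(x_j) = \mu_{(j,l)}(x_i)\,\nu_{(j,k)}(x_i) = -x_i\,\mu_{(j,k)}(x_i)\,\mu_{(j,l)}(x_i)\sqrt{\frac{b_j(-x_i)}{b_j(x_i)}}.$$

By (12), $\mu_{(i,j)}(x_i) = \mu_{(j,k)}(x_i)\mu_{(j,l)}(x_i)$. Moreover, $b_i = b_j$ because $\psi_{ij}$ forces $x_i = x_j$. So the right-hand side is exactly $\nu_{(i,j)}(x_i)$ as given by (29) at node $i$. Hence $\beta_{ij} = 1$. □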
4.3. At general nodes

We have defined $\nu$ on the transformed graph $G_T = (V_T, E_T)$, in which every node has degree at most three. By shrinking the added edges, we define $\nu$ on the original graph $G = (V,E)$. In other words, the injection $\vec E \to \vec E_T$ induces the messages on the original graph from those on the transformed graph. At each node of the original graph, the following theorem holds. This theorem generalizes the rules in figure 6 and lemma 3.

Theorem 2. Let $j\in V$ and $N(j) = \{i_1,\ldots,i_{d_j}\}$. Then, for $0 \le n \le d_j$,

$$f_n(\gamma_j) = \sum_{x_j}\prod_{s=1}^{n}\nu_{(j,i_s)}(x_j)\prod_{t=1}^{d_j-n}\mu_{(j,i_{n+t})}(x_j), \qquad (30)$$

where $\{f_n(x)\}_{n=0}^{\infty}$ is a set of polynomials defined inductively by the relations $f_0(x) = 1$, $f_1(x) = 0$ and $f_{n+1}(x) = x f_n(x) + f_{n-1}(x)$.

Proof. We can reduce to the case $n = d_j$ by splitting node $j$ and using the propagation rules in figure 6; see figure 9 for an example. The proof is by induction; the cases $n = 1, 2, 3$ follow from the definitions and lemma 3. In the case $n \ge 4$, we split the node $j$ into $j'$ and $j''$, where 2 and $n-2$ secondary messages join at $j'$ and $j''$, respectively. Adding a node $k$ between $j'$ and $j''$, we use the relation $\mu_{(k,j')}(x_k)\mu_{(k,j'')}(x'_k) + \nu_{(k,j')}(x_k)\nu_{(k,j'')}(x'_k) = \delta_{x_k,x'_k}$. By lemma 4 and the induction hypothesis, we obtain $f_n(\gamma_j) = f_2(\gamma_j)f_{n-2}(\gamma_j) + f_3(\gamma_j)f_{n-1}(\gamma_j)$. Since $f_2 = 1$ and $f_3(x) = x$, the assertion follows from the definition of the polynomials $f_n$. Figure 10 illustrates this procedure. □

It is surprising that the right-hand side is determined by the value $\gamma_j$, which depends only on the belief $b_j$, while the messages $\mu_{(j,i_s)}$ and $\nu_{(j,i_s)}$ are not determined by $b_j$.

In this section we have derived a set of rules. In addition to the diagrams showing the rules, diagrams in which these rules are applied successively are also called propagation diagrams.
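The recursion for $f_n$ is easy to tabulate. The short sketch below is our own illustration, and the function names are hypothetical. It generates the integer coefficient lists, which shows that all coefficients are non-negative integers. It also evaluates $f_n(0)$ and $f_n(1)$, the values used in sections 5.1 and 5.4.2.

```python
import itertools


def f_poly(n):
    """Integer coefficients [c_0, c_1, ...] of f_n(x); [] is the zero polynomial.

    f_0 = 1, f_1 = 0 and f_{n+1}(x) = x f_n(x) + f_{n-1}(x), as in theorem 2.
    """
    prev, cur = [1], []
    if n == 0:
        return prev
    for _ in range(n - 1):
        shifted = [0] + cur                       # coefficients of x * f_k(x)
        nxt = [a + b for a, b in itertools.zip_longest(shifted, prev, fillvalue=0)]
        prev, cur = cur, nxt
    return cur


def f_eval(n, x):
    """Evaluate f_n at x from its coefficient list."""
    return sum(c * x ** k for k, c in enumerate(f_poly(n)))
```

For example, `f_poly(4)` is `[1, 0, 1]`, i.e. $f_4(x) = 1 + x^2$. The values $f_n(1)$ follow the Fibonacci numbers.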




Figure 9. Reduction to the case of $n = d_j$.




Figure 10. Derivation of the inductive relation in the case of n = 4. 
4.4. In the case of one-dimensional systems

In the case of one-dimensional spin systems, the results of sections 3 and 4 reduce to easy observations. Let the graph $G$ be a cycle of length $N$. We cut node 1 as discussed in section 3.1 to obtain a string. The partition function is

$$Z = \sum_{x_1=\pm1}T^{x_1}_{x_1}, \qquad (31)$$

where the transfer matrix $T = T_1$ is defined by (8) and (9). Therefore $Z$ is equal to the sum of the first and second eigenvalues of the matrix $T$. By theorem 1, the first messages $\mu_1$ and $\bar\mu_1$ are the first right and left eigenvectors of the transfer matrix, and the first eigenvalue is $Z_B$. By lemma 1, we see that $\nu_1$ and $\bar\nu_1$ are the second right and left eigenvectors, and the second eigenvalue is $Z_B\prod_{ij\in E}\beta_{ij}$. The conditions (20) can be regarded as orthogonality of eigenvectors. The results of these sections generalize this picture to the transfer tensors associated with more complicated graphs with nodes of degree three.
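For a cycle, the statement $Z = \lambda_1 + \lambda_2$ can be checked numerically. The sketch below assumes the cycle's pairwise tables are given in order as 2×2 lists `psis[t]` $= \psi_{t,t+1}$, with indices taken modulo $N$ and index 0 for $x=+1$. It multiplies the tables into the transfer matrix and compares the trace with brute-force summation. The function names are hypothetical. According to the discussion above, $\lambda_1 = Z_B$ and $\lambda_2 = Z_B\prod\beta_{ij}$.

```python
import itertools
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def cycle_transfer(psis):
    """Return (Z, lambda_1, lambda_2) for a cycle with tables psis[t] = psi_{t,t+1}.

    The transfer matrix is T = psi_{0,1} psi_{1,2} ... psi_{N-1,0}, and Z = tr T.
    """
    t = [[1.0, 0.0], [0.0, 1.0]]
    for p in psis:
        t = matmul(t, p)
    tr = t[0][0] + t[1][1]
    det = t[0][0] * t[1][1] - t[0][1] * t[1][0]
    disc = math.sqrt(tr * tr - 4.0 * det)  # equals sqrt((a-d)^2 + 4bc) > 0 for a positive matrix
    return tr, (tr + disc) / 2.0, (tr - disc) / 2.0


def cycle_brute_force(psis):
    """Sum of prod_t psi_{t,t+1}(x_t, x_{t+1}) over all 2^N configurations."""
    n = len(psis)
    return sum(math.prod(psis[t][x[t]][x[(t + 1) % n]] for t in range(n))
               for x in itertools.product(range(2), repeat=n))
```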



5. Loop series expansion formula 

5.1. Derivation of loop series expansion formula 

Let $G = (V,E)$ be a graph and $D$ a subset of the edge set $E$. The edge-induced subgraph of $D$ is the subgraph of $G$ whose edge set is $D$ and whose node set consists of all nodes incident with at least one edge in $D$.

We are now ready to prove the expansion formula of the partition function. 

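Theorem 3. For each solution of (12), define $\{\beta_{ij}\}$ and $\{\gamma_i\}$ by the above lemmas. Then the following expansion formula holds:

$$Z = Z_B\Bigl(1 + \sum_{C\in\mathcal G}r(C)\Bigr),\qquad r(C) := \prod_{ij\in E_C}\beta_{ij}\prod_{i\in V_C}f_{d_i(C)}(\gamma_i), \qquad (32)$$

where $\mathcal G$ is the set of all nonempty edge-induced subgraphs of $G$, $C = (V_C, E_C)$, and $d_i(C)$ is the degree of node $i$ in $C$.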

Proof. Adding a node $k$ on each edge of $G$, we expand the partition function by the relation $\mu_{(k,i)}(x_k)\mu_{(k,j)}(x'_k) + \nu_{(k,i)}(x_k)\nu_{(k,j)}(x'_k) = \delta_{x_k,x'_k}$. There are $2^{|E|}$ terms in this expansion, because for each edge $(i,j)$ there is a choice between $\mu_{(k,i)}(x_k)\mu_{(k,j)}(x'_k)$ and $\nu_{(k,i)}(x_k)\nu_{(k,j)}(x'_k)$. If we regard the $\nu$-edges as an edge set, each term can be identified with an edge-induced subgraph. By theorem 2 and lemmas 1 and 2, the term corresponding to $C$ is equal to $r(C)$. □
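Theorem 3 can be checked numerically on a small graph. The sketch below is a brute-force verification, not an efficient algorithm. It runs synchronous LBP for a fixed number of sweeps, which we assume is enough because the couplings are weak. It then computes $\beta_{ij}$ by (37) (given in section 5.1) and $\gamma_i$ by (28) from the beliefs. Finally, it enumerates all nonempty edge subsets and compares $Z_B(1+\sum_C r(C))$ with the exact $Z$. The complete graph $K_4$, the compatibility tables and all names are our own choices.

```python
import itertools
import math


def run_lbp(n, edges, psi, iters=1000):
    """Synchronous LBP; returns node beliefs, edge beliefs and Z_B from (6)."""
    nb = {v: [] for v in range(n)}
    for i, j in edges:
        nb[i].append(j)
        nb[j].append(i)

    def tab(i, j, a, c):
        # psi_ij(x_i = a, x_j = c), whichever orientation of the edge is stored
        return psi[(i, j)][a][c] if (i, j) in psi else psi[(j, i)][c][a]

    m = {(j, i): [0.5, 0.5] for i in range(n) for j in nb[i]}  # message from i to j
    for _ in range(iters):
        new = {}
        for (j, i) in m:
            v = [sum(tab(j, i, a, c) * math.prod(m[(i, k)][c] for k in nb[i] if k != j)
                     for c in range(2)) for a in range(2)]
            s = sum(v)
            new[(j, i)] = [x / s for x in v]
        m = new
    b1, b2 = {}, {}
    for i in range(n):
        v = [math.prod(m[(i, j)][a] for j in nb[i]) for a in range(2)]
        b1[i] = [x / sum(v) for x in v]
    for (i, j) in edges:
        t = [[tab(i, j, a, c)
              * math.prod(m[(i, k)][a] for k in nb[i] if k != j)
              * math.prod(m[(j, k)][c] for k in nb[j] if k != i)
              for c in range(2)] for a in range(2)]
        s = sum(map(sum, t))
        b2[(i, j)] = [[x / s for x in row] for row in t]
    log_zb = sum((len(nb[i]) - 1) * sum(p * math.log(p) for p in b1[i]) for i in range(n))
    log_zb -= sum(b2[e][a][c] * math.log(b2[e][a][c] / psi[e][a][c])
                  for e in edges for a in range(2) for c in range(2))
    return b1, b2, math.exp(log_zb)


def f(n, x):
    """f_0 = 1, f_1 = 0, f_{k+1} = x f_k + f_{k-1}."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, 0.0
    for _ in range(n - 1):
        prev, cur = cur, x * cur + prev
    return cur


def loop_series_ratio(n, edges, b1, b2):
    """1 + sum of r(C) from (32) over all nonempty edge subsets C."""
    beta = {}
    for (i, j) in edges:
        det = b2[(i, j)][0][0] * b2[(i, j)][1][1] - b2[(i, j)][0][1] * b2[(i, j)][1][0]
        beta[(i, j)] = det / math.sqrt(b1[i][0] * b1[i][1] * b1[j][0] * b1[j][1])  # (37)
    gamma = {i: (b1[i][0] - b1[i][1]) / math.sqrt(b1[i][0] * b1[i][1]) for i in range(n)}  # (28)
    total = 1.0
    for size in range(1, len(edges) + 1):
        for sub in itertools.combinations(edges, size):
            deg = {}
            for i, j in sub:
                deg[i] = deg.get(i, 0) + 1
                deg[j] = deg.get(j, 0) + 1
            # subsets with a node of degree one contribute 0 because f(1, .) = 0
            total += math.prod(beta[e] for e in sub) * math.prod(f(d, gamma[v]) for v, d in deg.items())
    return total


def exact_z(n, edges, psi):
    """Brute-force partition function."""
    return sum(math.prod(psi[(i, j)][x[i]][x[j]] for (i, j) in edges)
               for x in itertools.product(range(2), repeat=n))
```

In the test, the Bethe value alone is off by a visible amount on $K_4$. The loop-corrected value agrees with the exact $Z$ up to rounding.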

Since $f_1(x) = 0$, $C$ contributes to the sum only if $C$ has no node of degree one in $C$. Such a $C$ is called a generalized loop. If $G$ is a tree, there is no generalized loop, and therefore $Z = Z_B$. This is well known [9].

Note that $f_n(0) = 1$ if $n$ is even and $f_n(0) = 0$ if $n$ is odd. If $\gamma_i = 0$ for all $i\in V$, only generalized loops with even degrees contribute to the sum. This is reminiscent of the high-temperature expansion of the Ising model without magnetic field. We discuss this point further in section 5.3. In addition, if the graph is planar, these terms are summed by a single Pfaffian [13], [14].

We can also choose some of the edges and expand them step by step with propagation diagrams, as in Appendix A.1, whereas in the above proof we expanded at all edges.

Let $\theta(\{\beta_{ij}\},\{\gamma_i\}) := 1 + \sum_{C\in\mathcal G}r(C)$. This is a polynomial in the indeterminates $\{\beta_{ij}\}$ and $\{\gamma_i\}$. Its coefficients are positive integers, because the $f_n(x)$ are polynomials with positive integer coefficients. We can assign other quantities to nodes and edges, such as $\{m_i\}$ and $\{\tau_{ij}\}$ discussed later in section 5.2, and represent the formula in a different manner with these quantities. Such an assignment, however, may produce large positive and negative coefficients. The advantage of using $\gamma$ and $\beta$ is that the coefficients are not huge, because they are all positive and their total sum is determined by $L$, as discussed later in section 5.4.2. Note that the method of propagation diagrams gives an algorithm for computing $\theta$.

The messages $\mu$ and $\nu$ are given explicitly if we change the compatibility functions. By the definitions (4) and (5) of the beliefs, we see that

$$\prod_{ij\in E}\psi_{ij}(x_i,x_j) \propto \prod_{ij\in E}b_{ij}(x_i,x_j)\prod_{i\in V}b_i(x_i)^{1-d_i}. \qquad (33)$$

Therefore, we can retake the compatibility functions as

$$\psi_{ij}(x_i,x_j) := \frac{b_{ij}(x_i,x_j)}{b_i(x_i)^{(d_i-1)/d_i}\,b_j(x_j)^{(d_j-1)/d_j}} \qquad (34)$$



without changing the joint probability distribution. Moreover, this does not change the result of the LBP algorithm, namely the beliefs. In the rest of section 5.1 and in section 5.2, we assume this representation of the compatibility functions. By (12), for all $(j,i)\in\vec E$ the first messages have the form $\mu_{(j,i)}(x_j) = b_j(x_j)^{1/d_j}$. The secondary messages are determined by theorem 2 as

$$\nu_{(j,i)}(x_j) = -x_j\,b_j(x_j)^{-(d_j-2)/(2d_j)}\,b_j(-x_j)^{1/2},\qquad (j,i)\in\vec E. \qquad (35)$$

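Using lemma 1 and lemma 2 on the transformed graph, we see that

$$\beta_{ij} = \sum_{x_i,x_j}\frac{b_{ij}(x_i,x_j)}{b_i(x_i)^{(d_i-1)/d_i}\,b_j(x_j)^{(d_j-1)/d_j}}\,\nu_{(i,i_1)}(x_i)\,\nu_{(j,j_1)}(x_j)\prod_{s=2}^{d_i-1}\mu_{(i,i_s)}(x_i)\prod_{t=2}^{d_j-1}\mu_{(j,j_t)}(x_j), \qquad (36)$$

where $N(i) = \{j, i_1,\ldots,i_{d_i-1}\}$ and $N(j) = \{i, j_1,\ldots,j_{d_j-1}\}$. A direct computation shows that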

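$$\beta_{ij} = \frac{b_{ij}(1,1)\,b_{ij}(-1,-1) - b_{ij}(1,-1)\,b_{ij}(-1,1)}{\sqrt{b_i(1)\,b_i(-1)\,b_j(1)\,b_j(-1)}}\qquad \forall (i,j)\in E. \qquad (37)$$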

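Equation (37) implies that $|\beta_{ij}| < 1$, since $\beta_{ij}$ is the correlation coefficient of $x_i$ and $x_j$ under the positive pairwise belief $b_{ij}$. By (37), $\beta_{ij} = 0$ if and only if $\psi_{ij}$ factorizes as $\psi_{ij}(x_i,x_j) = \psi_i(x_i)\psi_j(x_j)$ for some functions $\psi_i$ and $\psi_j$, i.e., there is no interaction between nodes $i$ and $j$.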
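The explicit forms above can be checked numerically. The sketch below is our own illustration, with hypothetical names and index 0 for $x=+1$. It assumes (35) reads $\nu_{(j,i)}(x_j) = -x_j\,b_j(x_j)^{-(d_j-2)/(2d_j)}\,b_j(-x_j)^{1/2}$, together with $\mu_{(j,i)} = b_j^{1/d_j}$. It checks theorem 2 for every $0 \le n \le d_j$. It also compares (37) with the form of (36) obtained after substituting (34) and (35), namely $\beta_{ij} = \sum b_{ij}(x_i,x_j)\,x_ix_j\sqrt{b_i(-x_i)b_j(-x_j)/(b_i(x_i)b_j(x_j))}$. This simplified form is our own rearrangement and is not stated in the text.

```python
import math

SIGN = (1, -1)  # index 0 <-> x = +1, index 1 <-> x = -1


def f(n, x):
    """f_0 = 1, f_1 = 0, f_{k+1} = x f_k + f_{k-1}."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, 0.0
    for _ in range(n - 1):
        prev, cur = cur, x * cur + prev
    return cur


def messages(b, d):
    """First and secondary messages at a node of degree d with belief b, cf. (35)."""
    mu = [b[a] ** (1.0 / d) for a in range(2)]
    nu = [-SIGN[a] * b[a] ** (-(d - 2) / (2.0 * d)) * math.sqrt(b[1 - a]) for a in range(2)]
    return mu, nu


def theorem2_lhs(b, d, n):
    """sum_x nu(x)^n mu(x)^(d - n): the right-hand side of (30) in this representation."""
    mu, nu = messages(b, d)
    return sum(nu[a] ** n * mu[a] ** (d - n) for a in range(2))


def beta_37(bij, bi, bj):
    return (bij[0][0] * bij[1][1] - bij[0][1] * bij[1][0]) / math.sqrt(bi[0] * bi[1] * bj[0] * bj[1])


def beta_36(bij, bi, bj):
    return sum(bij[a][c] * SIGN[a] * SIGN[c] * math.sqrt(bi[1 - a] * bj[1 - c] / (bi[a] * bj[c]))
               for a in range(2) for c in range(2))
```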
5.2. Relation to the result of Chertkov and Chernyak

Chertkov and Chernyak [8] showed the loop series expansion formula for general vertex models and factor graph models, which are more general than the pairwise interaction models considered in this paper. Focusing on pairwise interaction models, however, we found further relations between the first and secondary messages in theorem 2, which lead to our representation of $r(C)$. In this section we show that the expansion formula given in theorem 3 is equivalent to the result of Chertkov and Chernyak in [8]. Let us briefly review their result in the case of pairwise MRFs:

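$$Z = Z_B\Bigl(1 + \sum_{C\in\mathcal G}\tilde r(C)\Bigr),\qquad \tilde r(C) = \prod_{ij\in E_C}\tau_{ij}\prod_{i\in V_C}\rho_i(C), \qquad (38)$$

$$\rho_i(C) = \frac{(1-m_i)^{d_i(C)-1} + (-1)^{d_i(C)}(1+m_i)^{d_i(C)-1}}{2\,(1-m_i^2)^{d_i(C)-1}},$$

$$\tau_{ij} = \sum_{x_i,x_j}b_{ij}(x_i,x_j)(x_i-m_i)(x_j-m_j),\qquad m_i = \sum_{x_i}b_i(x_i)\,x_i.$$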



It suffices to prove $r(C) = \tilde r(C)$ for all generalized loops $C$.

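By the inductive definition of the polynomials $f_n$, we see that

$$f_n(x) = \frac{\lambda_1^{n-1} - \lambda_2^{n-1}}{\lambda_1 - \lambda_2}, \qquad (39)$$

where $\lambda_1$ and $\lambda_2$ are the roots of the quadratic equation $\lambda^2 - x\lambda - 1 = 0$. Using the definition of $\gamma_i$, a direct calculation gives

$$\rho_i(C)\,\bigl(2\sqrt{b_i(1)\,b_i(-1)}\bigr)^{d_i(C)} = (-1)^{d_i(C)}f_{d_i(C)}(\gamma_i). \qquad (40)$$

Using (37), we see that

$$\tau_{ij} = \beta_{ij}\,\bigl(2\sqrt{b_i(1)\,b_i(-1)}\bigr)\bigl(2\sqrt{b_j(1)\,b_j(-1)}\bigr). \qquad (41)$$

Combining (40) and (41) gives the claim. In $\prod_{ij\in E_C}\tau_{ij}$, the factor $2\sqrt{b_i(1)b_i(-1)}$ appears exactly $d_i(C)$ times. The signs cancel because $\sum_{i\in V_C}d_i(C) = 2|E_C|$ is even.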
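The closed form (39) is a Binet-type formula, and it can be compared numerically with the recursion of theorem 2. In this sketch, which is our own illustration with hypothetical names, the roots $\lambda_{1,2} = (x \pm \sqrt{x^2+4})/2$ are real and distinct for every real $x$, so the division is always safe.

```python
import math


def f_recursive(n, x):
    """f_0 = 1, f_1 = 0, f_{k+1} = x f_k + f_{k-1}."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, 0.0
    for _ in range(n - 1):
        prev, cur = cur, x * cur + prev
    return cur


def f_closed(n, x):
    """Closed form (39) with lambda_1, lambda_2 the roots of lambda^2 - x lambda - 1 = 0."""
    root = math.sqrt(x * x + 4.0)
    lam1, lam2 = (x + root) / 2.0, (x - root) / 2.0
    return (lam1 ** (n - 1) - lam2 ** (n - 1)) / (lam1 - lam2)
```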

We add two comments on the difference between our approach and theirs. First, we derived the expansion from the viewpoint of message passing, and the quantities $\beta_{ij}$ and $\gamma_i$ are characterized by the propagation of messages. Chertkov and Chernyak, on the other hand, used covariances and means of the beliefs for the expansion. Second, we interpreted the recursion relation of $f_n$ through transformations of the graphs in the proof of theorem 2. The corresponding relation is not clear in their choice of variables $\tau_{ij}$ and $\rho_i$. The recursion is used effectively to upper-bound the number of generalized loops in section 5.4.2.



5.3. Ising partition function on a regular graph

In this section we briefly discuss the connection between the polynomial $\theta(\beta,\gamma) := \theta(\{\beta_{ij} = \beta\},\{\gamma_i = \gamma\})$ and the partition function of the Ising model on a regular graph $G$. A graph $G$ is called regular if all of its nodes have the same degree. We see in corollary 1 that $\theta$ can be regarded as a transform of the partition function on the basis of the Bethe approximation, and we apply it to the derivation of a susceptibility formula. In this subsection, we assume $\gamma_i = \gamma$, $\beta_{ij} = \beta$ and that $G$ is a regular graph of degree $d$. From the relations (28) and (37), we can solve for $b_{ij}$ in terms of $\beta$ and $\gamma$:



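$$b_{ij}(x_i,x_j) = \frac{1}{4}\Bigl(1 + \frac{\gamma(x_i+x_j)}{\sqrt{4+\gamma^2}} + \frac{\gamma^2 x_ix_j}{4+\gamma^2}\Bigr) + \frac{\beta\,x_ix_j}{4+\gamma^2}. \qquad (42)$$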



By (32) and (34), the polynomial $\theta$ admits the following identity:

$$\theta(\beta,\gamma) = \sum_{x}\prod_{ij\in E}b_{ij}(x_i,x_j)\prod_{i\in V}b_i(x_i)^{1-d}, \qquad (43)$$

where $b_{ij}$ is defined by (42) and $b_i$ is defined by $b_i(x_i) = \sum_{x_j=\pm1}b_{ij}(x_i,x_j)$. Let $Z(K,h) = \sum_x\exp\bigl(K\sum_{ij\in E}x_ix_j + h\sum_{i\in V}x_i\bigr)$ be the partition function of the Ising model. As a function of $y$ and $z$, we obtain the following identity.

Corollary 1. For a regular graph $G$ of degree $d$,

$$\theta(\beta,\gamma) = Z(K,h)\Bigl(\frac{\sqrt{1-z^2}\,(1+y^2z)}{1-y^2z^2}\Bigr)^{|E|}\Bigl(\frac{\sqrt{(1-y^2)(1-y^2z^2)}}{2(1+y^2z)}\Bigr)^{|V|}, \qquad (44)$$

where $\beta = (1-y^2)z/(1-y^2z^2)$ and $\gamma = 2y(1+z)/\sqrt{(1-y^2)(1-y^2z^2)}$. Furthermore, $K = \tanh^{-1}z$ and

$$\exp(2h) = \frac{1+y}{1-y}\Bigl(\frac{1+yz}{1-yz}\Bigr)^{1-d}. \qquad (45)$$

Proof. The proof is accomplished by calculations based on (43). Let $b_{ij}(x_i,x_j) = \exp(Kx_ix_j + h'x_i + h'x_j + C)$ and $b_i(x_i) = \exp(h''x_i + D)$. We define $z = \tanh K$ and $y = \tanh h'$. By (42), $\beta$, $\gamma$ and $C$ can be expressed in terms of $y$ and $z$. The condition $b_i(x_i) = \sum_{x_j=\pm1}b_{ij}(x_i,x_j)$ determines $h''$ and $D$. Let $h = dh' + (1-d)h''$. Then $\prod_{ij\in E}b_{ij}\prod_{i\in V}b_i^{1-d}$ is proportional to $\exp\bigl(K\sum_{ij\in E}x_ix_j + h\sum_i x_i\bigr)$. □

If $y = 0$, then $h = 0$, $\gamma = 0$ and $\beta = z$, and the corollary reduces to the well-known high-temperature expansion. This formula is therefore an extension of the high-temperature expansion of the Ising model to the case with an external field.

We proceed to obtain a formula for the zero-field susceptibility, which is defined by

$$\chi(K) := \frac{1}{|V|}\,\frac{\partial^2}{\partial h^2}\log Z(K,h)\Big|_{h=0}. \qquad (46)$$

By differentiating (44), we have

$$(1+z-dz)^2\chi(K) = (1+z)(1+z-dz) + \frac{2z(z^2-1)}{|V|}\,\frac{\partial}{\partial z}\log\theta(z,0) + \frac{8(1+z)^2}{|V|}\,\frac{\partial}{\partial(\gamma^2)}\log\theta(z,\gamma)\Big|_{\gamma=0}. \qquad (47)$$

If we substitute 1 for $\theta$, which corresponds to the Bethe approximation, this formula reduces to the well-known Bethe approximation of the susceptibility [16]. Higher-order approximations can be obtained by enumerating the generalized loops that appear in $\theta$. Comparing our expansion with traditional methods of subgraph enumeration [17], [18] is an interesting topic for future research.
5.4. Miscellaneous topics

5.4.1. Representation of $T_k$. Using the first and secondary messages and $\{\beta_{ij}\}$, the matrix $T_k{}^{x_k}_{\bar x_k}$ defined in (9) admits a simple representation. Let $k$ be a leaf node of the tree $\tilde G$ with $N(k) = \{i,j\}$ on the original graph $G$. Let $i_0, i_1, \ldots, i_l$ be the unique path from $i_0 = k$ to $i_l = \bar k$ on $\tilde G$, so that $i_1$ and $i_{l-1}$ are the two neighbours of $k$. Using propagation diagrams on $\tilde G$, it is easy to see that $T_k\mu_{(k,i)} = \mu_{(k,i)}$ and $T_k\nu_{(k,i)} = \prod_{t=1}^{l}\beta_{i_{t-1}i_t}\,\nu_{(k,i)}$. Therefore

$$T_k{}^{x_k}_{\bar x_k} = \mu_{(k,i)}(x_k)\,\mu_{(k,j)}(\bar x_k) + \prod_{t=1}^{l}\beta_{i_{t-1}i_t}\,\nu_{(k,i)}(x_k)\,\nu_{(k,j)}(\bar x_k). \qquad (48)$$

This shows that $\nu_{(k,j)}$ and $\nu_{(k,i)}$ are the left and right eigenvectors of the matrix $T_k$, with eigenvalue $\prod_{t=1}^{l}\beta_{i_{t-1}i_t}$.

5.4.2. The number of generalized loops. We first show that the polynomial $\theta(1,\gamma)$ depends on the graph $G$ only through $L$, the number of linearly independent cycles. Since $\beta = 1$, we can shrink any edge without changing the corresponding polynomial $\theta$. Any graph with $L$ independent cycles can therefore be reduced to a graph in which only one node has degree greater than two, namely $L$ rings joined at one point (see figure 11). Cutting the $L$ loops, we obtain

$$\theta(1,\gamma) = \sum_{n=0}^{L}\binom{L}{n}f_{2n}(\gamma) = (\gamma^2+4)^{\frac{L-1}{2}}\left[\Bigl(\frac{\gamma+\sqrt{\gamma^2+4}}{2}\Bigr)^{L-1} + \Bigl(\frac{-\gamma+\sqrt{\gamma^2+4}}{2}\Bigr)^{L-1}\right]. \qquad (49)$$

This equality shows that the sum of the coefficients of $\theta$ is equal to

$$\theta(1,1) = \Bigl(\frac{5+\sqrt5}{2}\Bigr)^{L-1} + \Bigl(\frac{5-\sqrt5}{2}\Bigr)^{L-1}. \qquad (50)$$
Moreover, we obtain a bound for the number of generalized loops. 

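Corollary 2. Let $\mathcal G_0$ be the set of all generalized loops of $G$, including the empty set. Then

$$|\mathcal G_0| \le \Bigl(\frac{5+\sqrt5}{2}\Bigr)^{L-1} + \Bigl(\frac{5-\sqrt5}{2}\Bigr)^{L-1}. \qquad (51)$$

This bound is attained if and only if every node of every generalized loop has degree at most three in that loop.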

Proof. Since $f_n(1) > 1$ for all $n \ge 4$ and $f_2(1) = f_3(1) = 1$, we have $r(C)|_{\beta=\gamma=1} \ge 1$ for all $C\in\mathcal G_0$. Equality holds if and only if $d_i(C) \le 3$ for all $i\in V_C$. This shows $|\mathcal G_0| \le \theta(1,1)$ and the equality condition. □
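Corollary 2 can be checked by brute force on small graphs. The sketch below is our own illustration, with hypothetical names, and it is exponential in $|E|$, so it is only suitable for tiny graphs. It counts the edge subsets without degree-one nodes and evaluates the bound (51), assuming $G$ is connected so that $L = |E| - |V| + 1$. For $K_4$, whose nodes all have degree three, the bound is attained. For two triangles sharing a node, the shared node can reach degree four and the count falls below the bound.

```python
import itertools
import math


def count_generalized_loops(n, edges):
    """Number of edge subsets (including the empty one) with no node of degree one."""
    count = 0
    for mask in range(1 << len(edges)):
        deg = [0] * n
        for t, (i, j) in enumerate(edges):
            if mask >> t & 1:
                deg[i] += 1
                deg[j] += 1
        count += all(d != 1 for d in deg)
    return count


def loop_bound(n, edges):
    """Right-hand side of (51) with L = |E| - |V| + 1 (G assumed connected)."""
    L = len(edges) - n + 1
    s5 = math.sqrt(5.0)
    return ((5 + s5) / 2) ** (L - 1) + ((5 - s5) / 2) ** (L - 1)
```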
6. Marginal expansion formula

6.1. Derivation of the marginal expansion formula 

We show relations between the approximated marginal $b_k$ and the true marginal $p_k$. In this section, we once again take the compatibility functions as in (34). Without loss of generality, we assume $k = 1$ and $N(1) = \{i,j\}$, i.e., node 1 has degree two. Indeed, splitting nodes as in figure 3 gives the added nodes the same marginal probability as the original one. Because



1 





1 



KM = A*W> = {^^j , "tM> = = { 
we see the following equations by a direct computation: 

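$$ I(x_1 = x'_1 = +) = b_1(+)\,\mu_{(1,i)}(x_1)\mu_{(1,j)}(x'_1) + b_1(-)\,\nu_{(1,i)}(x_1)\nu_{(1,j)}(x'_1) - \sqrt{b_1(+)\,b_1(-)}\,\mu_{(1,i)}(x_1)\nu_{(1,j)}(x'_1) - \sqrt{b_1(+)\,b_1(-)}\,\nu_{(1,i)}(x_1)\mu_{(1,j)}(x'_1), \tag{52} $$

$$ I(x_1 = x'_1 = -) = b_1(-)\,\mu_{(1,i)}(x_1)\mu_{(1,j)}(x'_1) + b_1(+)\,\nu_{(1,i)}(x_1)\nu_{(1,j)}(x'_1) + \sqrt{b_1(+)\,b_1(-)}\,\mu_{(1,i)}(x_1)\nu_{(1,j)}(x'_1) + \sqrt{b_1(+)\,b_1(-)}\,\nu_{(1,i)}(x_1)\mu_{(1,j)}(x'_1). \tag{53} $$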
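Identities (52) and (53) can be confirmed numerically. The sketch below assumes the vectors $\mu_{(1,i)} = \mu_{(1,j)} = (\sqrt{b_1(+)}, \sqrt{b_1(-)})$ and $\nu_{(1,i)} = \nu_{(1,j)} = (-\sqrt{b_1(-)}, \sqrt{b_1(+)})$, treats both sides as $2 \times 2$ arrays over $(x_1, x'_1)$ with index 0 for $+$ and 1 for $-$, and also checks that $\mu$ and $\nu$ are orthonormal, which is what makes the decomposition possible.

```python
import math

def messages(b_plus):
    """Assumed vectors mu and nu at node 1 (index 0 is the state +, index 1 is -)."""
    b_minus = 1.0 - b_plus
    mu = (math.sqrt(b_plus), math.sqrt(b_minus))
    nu = (-math.sqrt(b_minus), math.sqrt(b_plus))
    return mu, nu

def dot(u, v):
    """Euclidean inner product of two 2-vectors."""
    return u[0] * v[0] + u[1] * v[1]

def rhs(b_plus, sign):
    """Right-hand side of (52) for sign=+1 or of (53) for sign=-1, as a 2x2 array."""
    mu, nu = messages(b_plus)
    b = {+1: b_plus, -1: 1.0 - b_plus}
    s = math.sqrt(b_plus * (1.0 - b_plus))
    return [[b[sign] * mu[x] * mu[y] + b[-sign] * nu[x] * nu[y]
             - sign * s * (mu[x] * nu[y] + nu[x] * mu[y])
             for y in range(2)] for x in range(2)]

def indicator(sign):
    """I(x_1 = x_1' = sign) as a 2x2 array."""
    k = 0 if sign == +1 else 1
    return [[1.0 if x == y == k else 0.0 for y in range(2)] for x in range(2)]

for b_plus in (0.05, 0.3, 0.5, 0.8):
    mu, nu = messages(b_plus)
    assert math.isclose(dot(mu, mu), 1.0) and math.isclose(dot(nu, nu), 1.0)
    assert abs(dot(mu, nu)) < 1e-12
    for sign in (+1, -1):
        R, I = rhs(b_plus, sign), indicator(sign)
        assert all(abs(R[x][y] - I[x][y]) < 1e-12 for x in range(2) for y in range(2))
```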
By the definition of marginal probability, we see that

$$ \frac{Z}{Z_B}\, p_1(\pm) = \sum_{x_1, x'_1} T_1^{x_1 x'_1}\, I(x_1 = x'_1 = \pm), \tag{54} $$

where $T_1^{x_1 x'_1}$ is the matrix $T_k^{x_k x'_k}$ with $k = 1$, and $I$ is the indicator function. Using these equations, we can show the following theorem.

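Theorem 4.

$$ \frac{Z}{Z_B}\, p_1(\pm) = b_1(\pm) \sum_{x_1,x'_1} T_1^{x_1 x'_1} \mu_{(1,i)}(x_1)\mu_{(1,j)}(x'_1) + b_1(\mp) \sum_{x_1,x'_1} T_1^{x_1 x'_1} \nu_{(1,i)}(x_1)\nu_{(1,j)}(x'_1) \mp \sqrt{b_1(+)\,b_1(-)} \Bigl( \sum_{x_1,x'_1} T_1^{x_1 x'_1} \mu_{(1,i)}(x_1)\nu_{(1,j)}(x'_1) + \sum_{x_1,x'_1} T_1^{x_1 x'_1} \nu_{(1,i)}(x_1)\mu_{(1,j)}(x'_1) \Bigr). \tag{55} $$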
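Since (54) is linear in the indicator, Theorem 4 follows from (52) and (53) for any matrix $T_1$. The sketch below checks this for a hypothetical, randomly drawn positive $2 \times 2$ matrix, assuming $\mu = (\sqrt{b_1(+)}, \sqrt{b_1(-)})$ and $\nu = (-\sqrt{b_1(-)}, \sqrt{b_1(+)})$ on both edges at node 1. It also evaluates the single-cycle case, where $T_1$ has the form (48), taking $\gamma_1 = (b_1(+) - b_1(-))/\sqrt{b_1(+) b_1(-)}$ as an assumed definition.

```python
import math
import random

def messages(b_plus):
    """Assumed mu = (sqrt b(+), sqrt b(-)) and nu = (-sqrt b(-), sqrt b(+))."""
    b_minus = 1.0 - b_plus
    return ((math.sqrt(b_plus), math.sqrt(b_minus)),
            (-math.sqrt(b_minus), math.sqrt(b_plus)))

def pair_sum(T, u, v):
    """sum_{x1, x1'} T[x1][x1'] u(x1) v(x1'): one of the four summations in (55)."""
    return sum(T[x][y] * u[x] * v[y] for x in range(2) for y in range(2))

def theorem4_rhs(T, b_plus, sign):
    """Right-hand side of (55) for p_1(+) (sign=+1) or p_1(-) (sign=-1)."""
    mu, nu = messages(b_plus)
    b = {+1: b_plus, -1: 1.0 - b_plus}
    s = math.sqrt(b_plus * (1.0 - b_plus))
    return (b[sign] * pair_sum(T, mu, mu) + b[-sign] * pair_sum(T, nu, nu)
            - sign * s * (pair_sum(T, mu, nu) + pair_sum(T, nu, mu)))

# General T_1 (hypothetical entries). By (54), (Z/Z_B) p_1(+) = T[0][0] and (Z/Z_B) p_1(-) = T[1][1].
rng = random.Random(0)
T = [[rng.uniform(0.1, 2.0) for _ in range(2)] for _ in range(2)]
b_plus = 0.35
assert math.isclose(theorem4_rhs(T, b_plus, +1), T[0][0])
assert math.isclose(theorem4_rhs(T, b_plus, -1), T[1][1])

# Single-cycle case: T_1 of the form (48) with a hypothetical cycle product.
beta_prod = 0.6 * -0.4 * 0.9
mu, nu = messages(b_plus)
T_cycle = [[mu[x] * mu[y] + beta_prod * nu[x] * nu[y] for y in range(2)] for x in range(2)]
root = math.sqrt(b_plus * (1 - b_plus))
gamma_1 = (2 * b_plus - 1) / root          # assumed: (b_1(+) - b_1(-)) / sqrt(b_1(+) b_1(-))
lhs = (theorem4_rhs(T_cycle, b_plus, +1) - theorem4_rhs(T_cycle, b_plus, -1)) / root
assert math.isclose(lhs, gamma_1 * (1 - beta_prod))
```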



The four summation terms appearing in (55) can be expanded with propagation diagrams.

This expansion is computationally intractable if $L$ is large, in the same way as the expansion of the partition function $Z$. For relatively small graphs, however, we may be able to expand these terms. The terms in the expansion of the first two summations are labelled by the generalized loops, while the terms in the expansion of the last two summations are labelled by other subgraphs: subgraphs that have no node of degree one other than node 1. The expansion may be used heuristically for approximate computation of marginal probability distributions; namely, we can correct the beliefs using the terms corresponding to the major subgraphs in the expanded representation of (55).

Figure 12. An example of expansion with propagation diagrams in the one-loop case; the diagrams evaluate to $\gamma_1(1 - \beta_{12}\beta_{23}\beta_{13}) - 2(0 + 0)$.
With this theorem, an already known fact [19] is easily deduced.

Corollary 3. Let $L = 1$ and let node 1 lie on the unique cycle in $G$. Then $p_1(+) - p_1(-)$ and $b_1(+) - b_1(-)$ have the same sign.



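Proof. By theorem 4,

$$ \frac{Z}{Z_B}\,\frac{p_1(+) - p_1(-)}{\sqrt{b_1(+)\,b_1(-)}} = \gamma_1 \Bigl( \sum_{x_1,x'_1} T_1^{x_1 x'_1} \mu_{(1,i)}(x_1)\mu_{(1,j)}(x'_1) - \sum_{x_1,x'_1} T_1^{x_1 x'_1} \nu_{(1,i)}(x_1)\nu_{(1,j)}(x'_1) \Bigr) - 2 \Bigl( \sum_{x_1,x'_1} T_1^{x_1 x'_1} \mu_{(1,i)}(x_1)\nu_{(1,j)}(x'_1) + \sum_{x_1,x'_1} T_1^{x_1 x'_1} \nu_{(1,i)}(x_1)\mu_{(1,j)}(x'_1) \Bigr), \tag{56} $$

where $\gamma_1 = (b_1(+) - b_1(-))/\sqrt{b_1(+)\,b_1(-)}$ has the same sign as $b_1(+) - b_1(-)$. The right-hand side can be expanded with propagation diagrams as in figure 12. The first summation is 1, the second is the product of the $\beta_{ij}$ along the cycle, and the third and fourth are equal to 0. Since $|\beta_{ij}| < 1$, the result follows. □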
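Corollary 3 can be observed numerically by running loopy belief propagation on a single cycle and comparing the resulting belief with the exact marginal obtained by enumeration. The sketch assumes Ising-type compatibility functions $\psi_i(x_i) = e^{h_i x_i}$ and $\psi_{ij}(x_i, x_j) = e^{J_{ij} x_i x_j}$ with $x \in \{+1, -1\}$, random (hypothetical) parameters, and a synchronous message schedule; the node of interest is node 0 of the cycle.

```python
import itertools
import math
import random

STATES = (+1, -1)

def exact_marginal(h, J, node=0):
    """p_node(+1) on the cycle 0-1-...-(n-1)-0, by brute-force enumeration."""
    n = len(h)
    weight = {+1: 0.0, -1: 0.0}
    for x in itertools.product(STATES, repeat=n):
        e = sum(h[i] * x[i] + J[i] * x[i] * x[(i + 1) % n] for i in range(n))
        weight[x[node]] += math.exp(e)
    return weight[+1] / (weight[+1] + weight[-1])

def lbp_belief(h, J, node=0, max_iter=10_000, tol=1e-12):
    """Belief b_node(+1) from synchronous loopy BP on the same cycle."""
    n = len(h)
    nbrs = {i: ((i - 1) % n, (i + 1) % n) for i in range(n)}
    coupling = {}
    for i in range(n):                      # edge (i, i+1) carries J[i]
        coupling[(i, (i + 1) % n)] = coupling[((i + 1) % n, i)] = J[i]
    msg = {(i, j): {+1: 0.5, -1: 0.5} for i in range(n) for j in nbrs[i]}
    for _ in range(max_iter):
        new = {}
        for i, j in msg:
            k = nbrs[i][0] if nbrs[i][1] == j else nbrs[i][1]   # other neighbour of i
            out = {xj: sum(math.exp(h[i] * xi + coupling[(i, j)] * xi * xj) * msg[(k, i)][xi]
                           for xi in STATES)
                   for xj in STATES}
            z = out[+1] + out[-1]
            new[(i, j)] = {xj: v / z for xj, v in out.items()}
        delta = max(abs(new[e][+1] - msg[e][+1]) for e in msg)
        msg = new
        if delta < tol:
            break
    else:
        raise RuntimeError("loopy BP did not converge")
    a, c = nbrs[node]
    bel = {x: math.exp(h[node] * x) * msg[(a, node)][x] * msg[(c, node)][x] for x in STATES}
    return bel[+1] / (bel[+1] + bel[-1])

rng = random.Random(0)
for _ in range(100):
    n = rng.randint(3, 5)
    h = [rng.uniform(-1, 1) for _ in range(n)]
    J = [rng.uniform(-1, 1) for _ in range(n)]
    b1, p1 = lbp_belief(h, J), exact_marginal(h, J)
    if abs(b1 - 0.5) > 1e-9:                # skip numerically tied cases
        assert (b1 > 0.5) == (p1 > 0.5)     # same sign, as Corollary 3 states
```

The `if` guard only skips instances in which the belief is numerically indistinguishable from 1/2; the beliefs themselves generally differ from the exact marginals, but their sign relative to 1/2 agrees.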

The problem of finding an assignment that maximizes the marginal probability $p_1$ is called the maximum marginal assignment problem [19]. This corollary asserts that, when node 1 lies on the unique cycle of $G$, the assignment that maximizes the belief $b_1$ solves this problem.

6.2. Example 

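Consider the graph with nodes 1, 2, 3, 4 and edges $(1,3)$, $(1,4)$, $(3,4)$, $(2,3)$ and $(2,4)$ shown in the figure. In this case theorem 4 turns out to be

$$ \frac{Z}{Z_B}\, p_1(\pm) = b_1(\pm)\,(1 + \beta_{23}\beta_{34}\beta_{24}) + b_1(\mp)\,\beta_{14}\beta_{13}\,(\beta_{34} + \beta_{23}\beta_{24} + \beta_{23}\beta_{24}\beta_{34}\gamma_3\gamma_4) \mp \sqrt{b_1(+)\,b_1(-)}\,(\beta_{13}\beta_{34}\beta_{23}\beta_{24}\gamma_3 + \beta_{14}\beta_{34}\beta_{23}\beta_{24}\gamma_4). \tag{57} $$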
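To make the labelling of the four summations concrete, the sketch below enumerates the edge subsets of this graph in which no node other than node 1 has degree one, sorts them by the degree of node 1, and weights each subset by $\prod_{e} \beta_e \prod_{v \ne 1} f_{d_v}(\gamma_v)$. The weight rule and the recursion $f_n = \gamma f_{n-1} + f_{n-2}$ ($f_0 = 1$, $f_1 = 0$) are our assumptions, chosen to be consistent with (57) and figure A1; the $\beta$ and $\gamma$ values are hypothetical. The sketch checks the four brackets of (57) and that the first two add up to the five-term expansion of figure A1.

```python
from itertools import combinations

EDGES = [(1, 3), (1, 4), (3, 4), (2, 3), (2, 4)]    # the graph of this example

def f(n, gamma):
    """f_n(gamma) under the assumed recursion f_0 = 1, f_1 = 0, f_n = gamma f_{n-1} + f_{n-2}."""
    a, b = 1.0, 0.0
    for _ in range(n):
        a, b = b, gamma * b + a
    return a

def four_sums(beta, gamma):
    """Subset sums grouped as in (55): node 1 unused ('mumu'), node 1 of degree two
    ('nunu'), or node 1 of degree one through edge (1,3) or edge (1,4)."""
    sums = {"mumu": 0.0, "nunu": 0.0, (1, 3): 0.0, (1, 4): 0.0}
    for r in range(len(EDGES) + 1):
        for C in combinations(EDGES, r):
            deg = {}
            for i, j in C:
                deg[i] = deg.get(i, 0) + 1
                deg[j] = deg.get(j, 0) + 1
            if any(d == 1 for v, d in deg.items() if v != 1):
                continue                      # a node other than 1 has degree one
            w = 1.0
            for e in C:
                w *= beta[e]
            for v, d in deg.items():
                if v != 1:
                    w *= f(d, gamma[v])
            d1 = deg.get(1, 0)
            if d1 == 0:
                sums["mumu"] += w
            elif d1 == 2:
                sums["nunu"] += w
            else:
                sums[(1, 3) if (1, 3) in C else (1, 4)] += w
    return sums

beta = {(1, 3): 0.4, (1, 4): -0.3, (3, 4): 0.5, (2, 3): 0.2, (2, 4): -0.6}   # hypothetical
gamma = {2: 0.7, 3: -1.1, 4: 0.9}                                            # hypothetical
S = four_sums(beta, gamma)

b13, b14, b34, b23, b24 = (beta[e] for e in EDGES)
g3, g4 = gamma[3], gamma[4]
expected = {
    "mumu": 1 + b23 * b34 * b24,
    "nunu": b13 * b14 * (b34 + b23 * b24 + b23 * b24 * b34 * g3 * g4),
    (1, 3): b13 * b34 * b23 * b24 * g3,
    (1, 4): b14 * b34 * b23 * b24 * g4,
}
assert all(abs(S[k] - expected[k]) < 1e-12 for k in expected)

# The first two summations together give Z/Z_B, the five-term expansion of figure A1.
theta = (1 + b13 * b14 * b34 + b23 * b24 * b34 + b13 * b14 * b23 * b24
         + b13 * b14 * b23 * b24 * b34 * g3 * g4)
assert abs(S["mumu"] + S["nunu"] - theta) < 1e-12
```

Here each edge at node 1 contributes exactly one subgraph of the "other" kind, which is why each cross term in (57) is a single monomial.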






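By (57), we have

$$ \frac{Z}{Z_B}\,\frac{p_1(+) - p_1(-)}{\sqrt{b_1(+)\,b_1(-)}} = \gamma_1 \bigl( 1 + \beta_{23}\beta_{34}\beta_{24} - \beta_{13}\beta_{14}(\beta_{34} + \beta_{23}\beta_{24} + \beta_{23}\beta_{24}\beta_{34}\gamma_3\gamma_4) \bigr) - 2\,(\beta_{13}\beta_{34}\beta_{23}\beta_{24}\gamma_3 + \beta_{14}\beta_{34}\beta_{23}\beta_{24}\gamma_4). \tag{58} $$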

The right-hand side determines which state is more plausible. Letting $\beta_{34} = 0$ in (58), the expression reduces to the case $L = 1$ and is consistent with the result of corollary 3. Letting $\beta_{13} = 0$ instead, so that node 1 no longer lies on a cycle, we see that $p_1(+) - p_1(-)$ does not necessarily have the same sign as $\gamma_1$ [19].

If, for example, $|\beta_{ij}| < 1/2$ for all edges, $|\gamma_3|, |\gamma_4| < 1$, and $|\gamma_3|/2, |\gamma_4|/2 < |\gamma_1|$, then we see from (58) that $p_1(+) - p_1(-)$ and $\gamma_1$ have the same sign. The first condition requires the interactions to be weak, and the second requires that the beliefs at the nodes of degree three are not too strongly biased. The last condition is satisfied if $p_1(+)$ and $p_1(-)$ are not too close to each other.
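Under these conditions the coefficient of $\gamma_1$ in (58) is at least $1 - \tfrac{1}{8} - \tfrac{1}{4}\bigl(\tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8}\bigr) = \tfrac{21}{32}$, while the last term of (58) is bounded in absolute value by $(|\gamma_3| + |\gamma_4|)/8 < |\gamma_1|/2$, so the $\gamma_1$ term dominates. The sketch below spot-checks this bound by random sampling of (58) only; it does not check whether a sampled $(\beta, \gamma)$ arises from actual beliefs. The final line uses hypothetical values with $\beta_{13} = 0$ to show that the sign can flip when the conditions fail.

```python
import random

def rhs58(beta, g1, g3, g4):
    """Right-hand side of (58): (Z/Z_B)(p_1(+) - p_1(-)) / sqrt(b_1(+) b_1(-)).
    beta = (beta_13, beta_14, beta_34, beta_23, beta_24)."""
    b13, b14, b34, b23, b24 = beta
    coeff = 1 + b23 * b34 * b24 - b13 * b14 * (b34 + b23 * b24 + b23 * b24 * b34 * g3 * g4)
    return g1 * coeff - 2 * (b13 * b34 * b23 * b24 * g3 + b14 * b34 * b23 * b24 * g4)

rng = random.Random(1)
for _ in range(100_000):
    beta = [rng.uniform(-0.5, 0.5) for _ in range(5)]       # |beta_ij| < 1/2
    g3, g4 = rng.uniform(-1, 1), rng.uniform(-1, 1)          # |gamma_3|, |gamma_4| < 1
    lower = max(abs(g3), abs(g4)) / 2                        # need |gamma_1| above this
    g1 = rng.choice((-1, 1)) * rng.uniform(lower + 1e-9, lower + 3)
    assert (rhs58(beta, g1, g3, g4) > 0) == (g1 > 0)

# Outside the conditions the sign can differ from that of gamma_1:
print(rhs58((0.0, 0.9, 0.9, 0.9, 0.9), 0.1, 0.0, 5.0))     # negative, although gamma_1 > 0
```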

If we use the variables $m_i$ and $r_{ij}$ of section 5.2 instead of $\beta$ and $\gamma$, the expressions (57) and (58) generally become more complicated, and it is hard to find simple conditions.



7. Concluding remarks 

We introduced propagation diagrams, which enable us to compute the loop series expansion of the partition function and of marginal distributions with a set of simple rules. In this method, the parameters $\beta_{ij}$ and $\gamma_i$ are naturally assigned to each edge and node.

The accuracy of the Bethe approximation depends both on the strength of the interactions and on the topology of the underlying graph. The effect of the interactions is captured by the values of $\beta$ and $\gamma$, while the topological aspect of the graph, in the sense of the Bethe approximation, is extracted in the polynomial $\theta$.

We suggest two topics for future research. First, understanding the structure of the polynomial $\theta$ is important for constructing efficient approximation algorithms that exploit the graph topology, and the properties of $\theta$ should be investigated further. Second, on the basis of the results of this paper, it would be interesting to explain the empirically known fact that the quality of the Bethe approximation is low when LBP does not converge [20]. Since we have shown a direct relation between the message passing operation and the expansion variables $\beta$ and $\gamma$, the convergence of the LBP algorithm can be analyzed using them.
Appendix

Appendix A.1. Example of expansion

We consider the graph in figure 2. Normalizing $Z_B = 1$, we can calculate the loop expansion of the partition function with propagation diagrams, as in the following figure.

The five terms in the final expression of figure A1 correspond, respectively, to the subgraphs in figure A2.
$$ Z = 1 + \beta_{14}\beta_{34}\beta_{13} + \beta_{23}\beta_{34}\beta_{24} + \beta_{13}\beta_{14}\beta_{23}\beta_{24} + \beta_{13}\beta_{14}\beta_{23}\beta_{24}\beta_{34}\gamma_3\gamma_4 $$



Figure A1. Expansion of the partition function with the method of propagation diagrams. It is not necessary to expand all edges as in the proof of theorem 3.




Figure A2. Light red parts are generalized loops. 



